
feat: maximum Yandex Dialogs platform integration (Phases 0–2)#18

Merged
trudenboy merged 8 commits into dev from feat/platform-integration on May 7, 2026

Conversation

@trudenboy
Owner

Summary

Six commits delivering the four-phase plan from docs/NLU_RESEARCH.md: consume the rest of the Yandex Dialogs request envelope, polish responses for screened surfaces, delegate intent classification to the platform via custom grammar, and harden the webhook against inner-dispatch exceptions. Zero new runtime dependencies on this side; bumps ya-dialogs-api>=2.1.0 (released to PyPI separately) for the new intent CRUD surface.

| # | Commit | What |
|---|--------|------|
| 1 | 3a13d0f | Phase 0: read meta.interfaces (gate buttons on screen), request.markup.dangerous_context (graceful refusal), request.nlu.entities[YANDEX.NUMBER] (new volume_relative action: «прибавь на 20» / «убавь 5» / «на 15 громче»), log request.original_utterance for misclassification post-mortems |
| 2 | f052e6f | Phase 1: card parameter plumbing, suggestion buttons (Следующая / Пауза / Громче / Тише) on play/control success on screened surfaces, provider/tts_dictionary.py with ~40 foreign band-name transliterations (single-word + multi-word phrases), voice_continuation opt-in toggle (end_session=False after success) |
| 3 | fdb24a8 | Phase 2: eleven custom intent grammars (control.{pause,resume,next,previous,stop,volume_up,volume_down,shuffle_on,shuffle_off,now_playing} + play.my_wave) declared on the skill via the new ya-dialogs-api 2.1.0 IntentDraft API; runtime dispatcher reads request.nlu.intents first, regex parsers remain as fallback for the long tail. BigImage card emission deferred; it needs separate image-upload infrastructure |
| 4 | 745291f | Phase 2 follow-up: handle built-in YANDEX.REJECT (cancel pending) and YANDEX.HELP (contextual hint) in disambiguation/slot-elicit flows; CONFIRM and REPEAT deferred |
| 5 | a2e7e12 | Docs: root CLAUDE.md aligned with upstream Music Assistant CLAUDE.md (Sphinx docstrings, sync workflow, network-input validation, debugging); converted six existing Google-style docstrings to Sphinx |
| 6 | d2ce60a | Fix: wrap post-auth dispatch in try / except so a parser/resolver/dispatch raise surfaces as a Russian fallback («Что-то пошло не так. Попробуй ещё раз.») instead of HTTP 500 → Alice silence (per @chrisuthe review on upstream #3843) |

Resolves all four review threads from upstream music-assistant/server#3843 (two Copilot bot, two from @chrisuthe).

Test plan

  • pytest tests/ — 461 passed (was 411 pre-branch)
  • ruff check provider/ tests/ — clean
  • ruff format --check — clean
  • mypy provider/ — clean (pre-existing plugin.py:33 warning unrelated)
  • Live test: trigger one of the new grammar intents («следующая», «громче», «мою волну») on a real Yandex Station — requires one auto-moderation cycle (minutes to hours) for the grammar PATCH to land
  • Live test: relative-volume phrasing («прибавь на двадцать») bumps player volume by 20
  • Live test: dangerous-content phrase short-circuits with a generic refusal
  • Live test: «отмена» during disambiguation prompt clears pending state and ends session
  • Live test: induce a webhook-handler error and verify Alice gets the Russian fallback (not silence)

🤖 Generated with Claude Code

trudenboy and others added 6 commits May 7, 2026 22:24
Wires four free-tier Yandex Dialogs platform features into the webhook
without adding any external dependency:

- request.markup.dangerous_context: graceful refusal with end_session=true
  before NLU/search engages, so flagged content never lands in
  mass.music.search.
- meta.interfaces.screen: buttons in the disambiguation prompt are emitted
  only on screened surfaces; voice-only devices (Mini/Pro) get the same
  ordinal prompt without button payload.
- request.original_utterance: logged alongside the normalised command for
  misclassification post-mortems.
- request.nlu.entities[YANDEX.NUMBER]: new ParsedControl action
  "volume_relative" handles "прибавь на 20" / "убавь 5" / "на 15 громче"
  with regex-captured digits or entity fallback. Executor reads current
  player.volume_level, applies signed delta, clamps [0, 100], dispatches
  cmd_volume_set.
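
A minimal sketch of the two envelope checks described above: refuse early on
dangerous_context, and attach buttons only on screened surfaces. The helper
names, the refusal wording, and the prompt layout are assumptions for
illustration; only the envelope fields come from this commit message.

```python
from __future__ import annotations

from typing import Any

# Hypothetical refusal wording; the actual text is not quoted in this PR.
REFUSAL_TEXT = "Не могу помочь с этим запросом."


def has_screen(body: dict[str, Any]) -> bool:
    """True when meta.interfaces contains a `screen` entry."""
    interfaces = (body.get("meta") or {}).get("interfaces") or {}
    return "screen" in interfaces


def is_dangerous(body: dict[str, Any]) -> bool:
    """True when request.markup.dangerous_context is set."""
    markup = (body.get("request") or {}).get("markup") or {}
    return bool(markup.get("dangerous_context"))


def yandex_response(text: str, *, end_session: bool,
                    buttons: list[str] | None = None) -> dict[str, Any]:
    """Build a response envelope; buttons are omitted when not supplied."""
    response: dict[str, Any] = {"text": text, "end_session": end_session}
    if buttons:
        response["buttons"] = [{"title": title, "hide": True} for title in buttons]
    return {"version": "1.0", "response": response}


def disambiguation_prompt(body: dict[str, Any], players: list[str]) -> dict[str, Any]:
    """Refuse flagged requests first, then gate buttons on the screen flag."""
    if is_dangerous(body):
        return yandex_response(REFUSAL_TEXT, end_session=True)
    options = ", ".join(f"{i}. {name}" for i, name in enumerate(players, 1))
    return yandex_response(
        f"На какой колонке? {options}",
        end_session=False,
        buttons=players if has_screen(body) else None,
    )
```

Voice-only devices receive the same ordinal prompt, just without the
`buttons` key.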

Bare "прибавь" / "убавь" without a number still resolve to volume_up /
volume_down via the existing _CONTROL_PATTERNS rules. Player resolution
and music search remain in-house (domain logic, not NLU-shaped).
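
The relative-volume flow can be sketched as follows. This is an illustration
under assumptions, not the provider's code: `parse_volume_delta` and
`apply_volume_delta` are hypothetical names, and the word lists stand in for
the real _CONTROL_PATTERNS rules, which are not shown here.

```python
from __future__ import annotations

import re

# Hypothetical direction words; the real patterns live in _CONTROL_PATTERNS.
_UP_WORDS = {"прибавь", "громче"}
_DOWN_WORDS = {"убавь", "тише"}
_DIGITS = re.compile(r"\d+")


def parse_volume_delta(command: str, entity_number: int | None = None) -> int | None:
    """Return a signed delta, or None when no direction or number is found.

    Digits captured from the command win; `entity_number` stands in for a
    YANDEX.NUMBER entity value and is used only as a fallback.
    """
    words = set(command.lower().split())
    if words & _UP_WORDS:
        sign = 1
    elif words & _DOWN_WORDS:
        sign = -1
    else:
        return None
    match = _DIGITS.search(command)
    magnitude = int(match.group()) if match else entity_number
    if magnitude is None:
        return None  # bare "прибавь" / "убавь" is left to volume_up / volume_down
    return sign * max(0, magnitude)


def apply_volume_delta(current: int, delta: int) -> int:
    """Apply the signed delta and clamp the result to [0, 100]."""
    return max(0, min(100, current + delta))
```

For example, `apply_volume_delta(90, parse_volume_delta("прибавь на 20"))`
clamps at 100 rather than overshooting.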

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four platform-aware response improvements landing without new
dependencies. The screen-dependent ones (card, suggestion buttons) are
gated on meta.interfaces.screen, so voice-only surfaces (Mini, Pro) are
unaffected by them:

- card parameter plumbed through _yandex_response (BigImage / ItemsList /
  ImageGallery shapes documented). Actual emission deferred to Phase 1.5
  — BigImage requires image_id of a pre-uploaded asset and per-track
  album art can't be uploaded inside the 3-second webhook budget.

- Suggestion buttons (Следующая / Пауза / Громче / Тише) appended to
  play- and control-success responses on screened surfaces. Lets the
  user follow up by tap without re-saying the activation phrase.

- TTS dictionary moved to provider/tts_dictionary.py with two tables:
  WORD_REPLACEMENTS (Russian stress hints + ~26 foreign single-word
  artist transliterations) and PHRASE_REPLACEMENTS (16 multi-word
  bands, applied before the per-word regex). _tts_for now matches both
  Latin and Cyrillic so "Включаю Metallica" emits tts="Включ+аю
  Мет+аллика". text stays clean.

- voice_continuation toggle (CONF_DIALOG_VOICE_CONTINUATION, default
  off): when enabled, play- and control-success keep end_session=False
  for natural follow-ups. "стоп / выключи / выключи музыку" always
  closes the session regardless of the toggle. No UI surface yet —
  power-user knob for now.
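
A minimal sketch of the two-pass TTS substitution described above: phrases
first, then individual words, matching both Latin and Cyrillic. The table
entries and the `tts_for` name are illustrative assumptions; the real tables
in provider/tts_dictionary.py are larger and may be shaped differently.

```python
import re

# Illustrative entries only (assumed, not copied from tts_dictionary.py).
PHRASE_REPLACEMENTS = {"pink floyd": "Пинк Флойд"}
WORD_REPLACEMENTS = {"включаю": "включ+аю", "metallica": "мет+аллика"}

# One "word" is a run of Latin or Cyrillic letters.
_WORD = re.compile(r"[A-Za-zА-Яа-яЁё]+")


def tts_for(text: str) -> str:
    """Return a TTS string; the display text itself is left untouched."""
    out = text
    # Pass 1: multi-word phrases, so the word pass never splits a band name.
    for phrase, spoken in PHRASE_REPLACEMENTS.items():
        out = re.sub(re.escape(phrase), lambda _m, s=spoken: s, out, flags=re.IGNORECASE)

    # Pass 2: single words, preserving a leading capital letter.
    def _swap(match: re.Match) -> str:
        word = match.group(0)
        spoken = WORD_REPLACEMENTS.get(word.lower())
        if spoken is None:
            return word
        return spoken.capitalize() if word[0].isupper() else spoken

    return _WORD.sub(_swap, out)
```

Running the phrase pass first is the design choice that matters here: a
per-word table alone could not transliterate a multi-word band name as a unit.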

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t.nlu.intents (Phase 2)

Delegates intent classification to Yandex's grammar engine for the closed
set of control commands and the my_wave play intent. The platform's
synchronously-classified `request.nlu.intents.<form_name>` block now takes
precedence in the webhook handler; the existing regex parsers
(parse_command / parse_control) remain as fallback for phrases that
don't match any declared grammar — so this is purely additive coverage,
no regression risk.
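
The precedence rule can be sketched like this: check the platform's
`request.nlu.intents` first and fall back to regex only when no declared
grammar matched. The `classify` function, the mapping table, and the fallback
patterns are assumptions for illustration; the real mapping lives in
provider/dialogs_grammar.py and the real parsers are parse_command /
parse_control.

```python
from __future__ import annotations

import re
from typing import Any

# Hypothetical subset of the intent-to-action mapping.
INTENT_TO_ACTION = {
    "control.pause": "pause",
    "control.next": "next",
    "control.volume_up": "volume_up",
    "play.my_wave": "my_wave",
}

# Hypothetical stand-ins for the existing regex parsers.
_FALLBACK_PATTERNS = [
    (re.compile(r"^(пауза|поставь на паузу)$"), "pause"),
    (re.compile(r"^(следующ\w*|дальше)$"), "next"),
]


def classify(body: dict[str, Any]) -> tuple[str, str] | None:
    """Return (action, source), preferring platform intents over regex."""
    request = body.get("request") or {}
    intents = (request.get("nlu") or {}).get("intents") or {}
    for name in intents:
        action = INTENT_TO_ACTION.get(name)
        if action is not None:
            return action, "platform"
    command = (request.get("command") or "").strip().lower()
    for pattern, action in _FALLBACK_PATTERNS:
        if pattern.match(command):
            return action, "regex"
    return None
```

Because the regex path is untouched, a phrase that matches no grammar behaves
exactly as it did before the grammars were declared.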

Eleven grammars ship in provider/dialogs_grammar.py:
  control.{pause,resume,next,previous,stop,volume_up,volume_down,
           shuffle_on,shuffle_off,now_playing}
  play.my_wave

Each carries `positiveTests` for the dev-console "Протестировать" button
and uses %lemma where multi-word morphology matters. All grammar bodies
are conservative — Yandex's server-side validator catches malformed
sources synchronously and surfaces them as DialogsIntentValidationError,
so set_intents() will fail loud rather than silently deploying broken
NLU.

The intents pipeline runs between draft update and request_deploy, so
they land in the same moderation cycle as the rest of the draft (no
two-phase publish needed). Endpoints + payload shape were derived from
a Playwright probe of the live dev console on 2026-05-07; the
ya-dialogs-api>=2.1.0 dependency wraps the five new REST endpoints.

Bumps ya-dialogs-api>=2.1.0 in pyproject.toml + manifest.json.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yandex automatically classifies four built-in intents (CONFIRM / REJECT /
HELP / REPEAT) once any custom grammar is declared on the skill — and we
declared eleven in the previous Phase 2 commit. This wires up the two
that have unambiguous behaviour in our flows:

- YANDEX.REJECT during a pending disambiguation prompt or slot-elicit
  ("На какой колонке?" / "Что включить?") → respond "Хорошо, отменил.",
  clear pending state, end session. Outside of those prompts the intent
  falls through to normal command parsing — "отмена" without context is
  not a free-standing app-cancel signal.
- YANDEX.HELP → contextual hint matching the current prompt: re-explain
  how to answer the disambiguation, suggest example queries during slot
  elicit, or surface a generic "включи рок на кухне" example otherwise.
  State is preserved so the user can answer the original question next.

CONFIRM and REPEAT are deferred:
- CONFIRM is ambiguous in our flows (which player is the user agreeing
  to? — we have no canonical "yes" target).
- REPEAT requires caching the last response on session_state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream Music Assistant CLAUDE.md mandates Sphinx-style docstrings
(`:param:` syntax) and explicitly bans Google-style (`Args:`) and
bullet-style (`- param:`) — flagged in PR #3843 review by @chrisuthe.

This commit:

1. Adds a CLAUDE.md at the repo root that mirrors upstream's relevant
   sections (Behaviour, Code Style, Branching) and adapts the rest to
   this provider repo's specifics (sync workflow, provider/ layout,
   pre-commit gate, debugging via $HOME/.musicassistant). Cross-refs
   the Copilot review findings (is_public_https_url for any new
   network input) so future contributors don't re-introduce the same
   bug.

2. Converts the six Google-style docstring sections that had crept in
   (auth_page.py, auth_session.py, dialog_skill_meta.py, dialogs.py,
   dialogs_control.py, dialogs_nlu.py) to Sphinx-style. No behaviour
   change.

The webhook-handler error-handling concern from the same review thread
is mentioned in CLAUDE.md as a known follow-up but not addressed here —
that's a separate code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per @chrisuthe's review of upstream PR #3843: only `request.json()` was
guarded; an unexpected raise from a parser, the resolver, or MA dispatch
bubbled to aiohttp → HTTP 500 → Alice silence on the user's device.

Refactors `_handle_webhook` so the post-auth body lives in a new
`_handle_authenticated_request` method, called inside a `try / except`
that catches any non-CancelledError exception, logs it, and returns a
generic Russian fallback ("Что-то пошло не так. Попробуй ещё раз.")
with `end_session=False` so the conversation can continue. The original
exception is still emitted via `_logger.exception` so operators can
debug from `$HOME/.musicassistant/musicassistant.log`.

Adds a regression test that injects a RuntimeError into
`mass.players.all_players` (deep inside the play-resolve path) and
verifies the response is HTTP 200 with the Russian fallback text and
`end_session=false` — not HTTP 500.
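
The wrapper pattern can be sketched without aiohttp as below. The function
signature and the `inner` callable are assumptions standing in for
`_handle_webhook` and `_handle_authenticated_request`; the fallback text is
the one quoted above.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

_logger = logging.getLogger(__name__)
FALLBACK_TEXT = "Что-то пошло не так. Попробуй ещё раз."

Handler = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


async def handle_webhook(body: dict[str, Any], inner: Handler) -> tuple[int, dict[str, Any]]:
    """Run the post-auth handler; turn any ordinary exception into a fallback."""
    try:
        payload = await inner(body)
    except Exception:
        # asyncio.CancelledError derives from BaseException (Python 3.8+),
        # so cancellation is not swallowed by this clause.
        _logger.exception("Webhook dispatch failed")
        payload = {
            "version": "1.0",
            "response": {"text": FALLBACK_TEXT, "end_session": False},
        }
    return 200, payload
```

Returning HTTP 200 with a spoken fallback is the point of the change: the
user hears an error message instead of silence, and the traceback still
reaches the log.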

CLAUDE.md updated to call out the contract so future branches don't
regress it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'sting' entry in tts_dictionary.py is the artist Стинг, not a typo
of 'string'. Codespell flagged it on the PR #18 CI run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This PR expands the Yandex Dialogs integration for the Music Assistant Alice provider by consuming more of the request envelope, leveraging platform-side intent classification via custom grammars, improving TTS pronunciation, adding screen-only UX affordances, and hardening the webhook to avoid Alice “silence” on unexpected exceptions.

Changes:

  • Webhook handler now supports screened-surface UI gating (buttons), dangerous-context refusal, built-in YANDEX intents, platform request.nlu.intents precedence, and a post-auth try/except fallback response.
  • Control layer adds volume_relative parsing (regex + YANDEX.NUMBER entity fallback) and execution logic (read current volume, apply delta, clamp).
  • Skill provisioning now declares custom grammars via ya-dialogs-api>=2.1.0; TTS substitutions moved into a dedicated dictionary module; tests expanded accordingly.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| tests/test_dialogs.py | Adds coverage for webhook exception fallback, screen button gating, platform-intent dispatch precedence, built-in YANDEX intents, voice continuation, and TTS transliteration/phrase handling. |
| tests/test_dialogs_control.py | Adds parsing + execution tests for volume_relative (regex + entity fallback, clamping). |
| pyproject.toml | Bumps ya-dialogs-api to >=2.1.0. |
| provider/tts_dictionary.py | Introduces curated word/phrase replacement tables for TTS stress marks and foreign-name transliteration. |
| provider/plugin.py | Plumbs voice_continuation config into the webhook handler. |
| provider/manifest.json | Updates runtime requirement to ya-dialogs-api>=2.1.0. |
| provider/dialogs.py | Implements screen detection, suggestion buttons, dangerous-context refusal, platform intent mapping, built-in YANDEX intents, TTS phrase/word passes, and graceful exception fallback. |
| provider/dialogs_nlu.py | Docstring format updates (:returns:). |
| provider/dialogs_grammar.py | Adds skill grammar definitions and runtime mapping from request.nlu.intents to internal command/control types. |
| provider/dialogs_control.py | Adds volume_relative parsing (incl. YANDEX.NUMBER fallback) + execution and docstring format updates. |
| provider/dialog_skill_meta.py | Docstring format update (:raises:). |
| provider/constants.py | Adds CONF_DIALOG_VOICE_CONTINUATION and related commentary. |
| provider/auto_update.py | Supplies intents from build_grammar() during skill update pipeline. |
| provider/auto_create.py | Supplies intents from build_grammar() during skill creation pipeline. |
| provider/auth_session.py | Docstring format update (:raises:). |
| provider/auth_page.py | Docstring format update (:raises:). |
| CLAUDE.md | Adds aligned contributor guidance (commands, docstrings, validation, handler error-handling guarantees). |


Comment thread provider/dialogs.py Outdated
Comment thread provider/dialogs_control.py Outdated
Comment thread provider/constants.py Outdated
Three review threads from Copilot's pass on PR #18:

- **dialogs.py:580** — DEBUG "Webhook recv" line was emitting `cmd` and
  `original_utterance` *before* the dangerous_context refusal branch,
  leaking flagged content into $HOME/.musicassistant/musicassistant.log.
  Both fields are now replaced with `<redacted: dangerous_context>`
  when the flag is set; the rest of the structured fields stay intact
  so operators still see traffic shape. Regression test injects a
  flagged phrase and asserts it's absent from caplog records.
- **dialogs_control.py:318** — `volume_relative` clamped magnitude with
  `max(1, …)`, silently promoting "прибавь на 0" to a +1 bump. Clamp
  is now `max(0, …)` so the parsed delta matches the spoken number;
  zero is a valid no-op. Parametrised test covers all four phrasings.
- **constants.py:119** — comment promised "спасибо" closes the session
  via the `stop` control intent, but parse_control does not match it.
  Corrected to the actually-matched phrases (стоп / останови / выключи
  / выключи музыку). Pure doc fix.
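
The log-redaction fix from the first thread can be sketched as follows. The
`webhook_log_fields` helper and the exact set of logged fields are
assumptions; the placeholder string is the one quoted above.

```python
from typing import Any

REDACTED = "<redacted: dangerous_context>"


def webhook_log_fields(body: dict[str, Any]) -> dict[str, Any]:
    """Build the DEBUG log fields, hiding user text on flagged requests."""
    request = body.get("request") or {}
    flagged = bool((request.get("markup") or {}).get("dangerous_context"))
    return {
        # Only the two free-text fields are redacted; the remaining fields
        # stay so operators still see the shape of the traffic.
        "cmd": REDACTED if flagged else request.get("command", ""),
        "original_utterance": REDACTED if flagged else request.get("original_utterance", ""),
        "type": request.get("type"),
        "dangerous_context": flagged,
    }
```

Computing the fields before the refusal branch is safe once the redaction is
applied at construction time rather than at the call site.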

Plus: pin `ya-dialogs-api==2.1.0` in provider/manifest.json (== rather
than >=) so MA installs the exact version the provider was tested
against.

Bumps VERSION to 1.3.0 with a comprehensive CHANGELOG entry covering
the full Phase 0–2 work landing in PR #18 plus these three Copilot
fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@trudenboy trudenboy merged commit 104f953 into dev May 7, 2026
5 checks passed
@trudenboy trudenboy deleted the feat/platform-integration branch May 7, 2026 20:49

2 participants